Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: Case of grammatical inference

Authors

  • Hari Mohan Pandey
  • Ankit Chaudhary
  • Deepti Mehrotra
  • Graham Kendall
Abstract

In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge in identifying a language of infinite cardinality from a finite set of examples is knowing when to generalize and when to specialize the training data. This paper discusses how the incorporated minimum description length principle addresses this issue. The previously proposed e-GRIDS learning model also enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL incorporates a traditional genetic algorithm, which has a powerful global exploration capability that can exploit an optimum offspring. This makes it an effective approach for problems with a large search space, such as grammatical inference. The computational capability of the genetic algorithm is not in question, but it still suffers from premature convergence, mainly arising from a lack of population diversity. The proposed GAWMDL incorporates a bit-mask oriented data structure that performs the reproduction operations: it creates the masks, and then a Boolean-based procedure is applied to create an offspring in a generative manner. The Boolean-based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied to context-free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. A two-fold performance analysis has been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm, which was proposed to introduce diversity and to address premature convergence. Second, the GAWMDL is tested against the improved tabular representation algorithm.
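The abstract describes reproduction as two steps: creating masks, then combining chromosomes with a Boolean-based procedure. The following is a minimal sketch of uniform mask crossover and mask mutation using AND, OR, NOT and XOR. It assumes fixed-length bit-string chromosomes stored as Python integers. The helper names (`make_mask`, `crossover`, `mutate`) and the encoding are hypothetical illustrations. They are not the authors' BMODA data structure or their Boolean-based procedure, whose details are not given here.

```python
import random

# Hypothetical sketch: chromosomes are fixed-length bit strings stored as ints.


def make_mask(length: int, rate: float, rng: random.Random) -> int:
    """Build a mask whose bits are set independently with probability `rate`."""
    mask = 0
    for i in range(length):
        if rng.random() < rate:
            mask |= 1 << i
    return mask


def crossover(p1: int, p2: int, crossmask: int, length: int) -> tuple[int, int]:
    """child1 takes p1's bit where the mask is 1 and p2's bit where it is 0; child2 does the opposite."""
    inverse = ~crossmask & ((1 << length) - 1)  # NOT, restricted to `length` bits
    child1 = (p1 & crossmask) | (p2 & inverse)
    child2 = (p2 & crossmask) | (p1 & inverse)
    return child1, child2


def mutate(chromosome: int, mutmask: int) -> int:
    """Flip every bit where the mutation mask is 1 (XOR)."""
    return chromosome ^ mutmask


# Fixed masks so the result is easy to check by hand.
c1, c2 = crossover(0b11110000, 0b10101010, 0b11001100, 8)
print(f"{c1:08b} {c2:08b}")  # 11100010 10111000
print(f"{mutate(c1, 0b00000001):08b}")  # 11100011

# In a GA run the masks would be random, e.g. driven by the crossover rate (CR).
rng = random.Random(42)
cm = make_mask(8, 0.5, rng)
mm = make_mask(8, 0.05, rng)
child, _ = crossover(0b11110000, 0b10101010, cm, 8)
child = mutate(child, mm)
```

Each child takes every bit from one parent or the other, so the two children together preserve the parents' genes position by position. The crossover mask only decides which child receives which gene.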
In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm that does not use the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves performance in three main aspects: it maintains the regularity of the data, alleviates premature convergence and is capable of grammatical inference from both positive and negative corpora.

© 2016 Elsevier B.V. All rights reserved.

Abbreviations: …, Accepting positive sample; BMODA, Bit masking oriented data structure; BNF, Backus Naur Form; BBP, Boolean based procedure; … free grammar; CS, Chromosome size; CM, Crossmask/crossover mask; CR, Crossover rate; DFA, Deterministic finite …; … Language; EA, Evolutionary algorithm; EMP, Elite Mating Pool Genetic Algorithm; GI, Grammatical inference; GA…; GAW, Genetic Algorithm without Minimum Description Length; GP, Genetic Programming; GA, Genetic algorithm; …, Model; MM, Mutmask/mutation mask; MDL, Minimum description length; NN, Neural Network; MA, Memetic Algorithm; … of allowable grammar rules; PAC, Probably Approximately Correct; PRL, Production rule length; PDA, Pushdown …; … Network; RNS, Rejecting negative sample; RPS, Rejecting positive sample; RL, Regular language; SOM, Self-organizing …; … entation Algorithm.
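The abstract credits the minimum description length principle with balancing generalization against specialization. The sketch below makes that trade-off concrete by scoring candidate grammars with a two-part code, L(G) + L(D|G). This is the number of bits needed to write down the grammar plus the bits needed to encode the positive sample using it. It is a toy under explicit assumptions, not the paper's fitness function:

- grammars are restricted to right-linear (regular) rules `A -> a B` or `A -> a`;
- every symbol gets a fixed-length code;
- each derivation step is charged log2 of the number of rules for the current nonterminal.

All names (`grammar_bits`, `data_bits`, `mdl` and the example grammars) are hypothetical.

```python
import math

# Hypothetical toy encoding, not the paper's MDL formulation.
END = "END"  # right-hand side marker for a terminating rule A -> a

# A rule (lhs, terminal, rhs) means lhs -> terminal rhs, or lhs -> terminal if rhs == END.
Rule = tuple[str, str, str]


def grammar_bits(rules: list[Rule]) -> float:
    """L(G): each rule is three symbols, each written with a fixed-length code."""
    symbols = {s for rule in rules for s in rule} | {END}
    return 3 * len(rules) * math.log2(len(symbols))


def data_bits(rules: list[Rule], sample: list[str], start: str = "S") -> float:
    """L(D|G): each derivation step picks one rule of the current nonterminal uniformly."""
    by_lhs: dict[str, list[Rule]] = {}
    for rule in rules:
        by_lhs.setdefault(rule[0], []).append(rule)
    total = 0.0
    for word in sample:
        if not word:
            return math.inf  # this toy rule form cannot derive the empty string
        current = start
        for i, ch in enumerate(word):
            last = i == len(word) - 1
            options = by_lhs.get(current, [])
            # Use a terminating rule on the last symbol, a continuing rule otherwise.
            chosen = next((r for r in options if r[1] == ch and (r[2] == END) == last), None)
            if chosen is None:
                return math.inf  # the grammar does not generate this positive example
            total += math.log2(len(options))
            current = chosen[2]
    return total


def mdl(rules: list[Rule], sample: list[str]) -> float:
    """Two-part description length L(G) + L(D|G); lower is better."""
    return grammar_bits(rules) + data_bits(rules, sample)


sample = ["ab", "abab", "ababab"]
general = [("S", "a", "A"), ("A", "b", "S"), ("A", "b", END)]  # (ab)+
too_general = [("S", "a", "S"), ("S", "b", "S"), ("S", "a", END), ("S", "b", END)]  # {a,b}+
memorized = [("S", "a", "A"), ("A", "b", "B"), ("A", "b", END), ("B", "a", "C"),
             ("C", "b", "D"), ("C", "b", END), ("D", "a", "E"), ("E", "b", END)]  # the sample only

for name, rules in [("general", general), ("too_general", too_general), ("memorized", memorized)]:
    print(f"{name}: {mdl(rules, sample):.1f} bits")
# general: 26.9 bits
# too_general: 48.0 bits
# memorized: 81.1 bits
```

Under this encoding the grammar for (ab)+ scores lowest. The memorizing grammar pays for its many rules, while the over-general grammar pays in data bits because every step has four equally likely choices. Using such a score as GA fitness pushes the search toward grammars that neither memorize the sample nor overgeneralize. Handling negative examples would need an extra penalty term, which this sketch omits.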

Similar resources

Inferring gene regulatory networks from time series data using the minimum description length principle

MOTIVATION A central question in reverse engineering of genetic networks consists in determining the dependencies and regulating relationships among genes. This paper addresses the problem of inferring genetic regulatory networks from time-series gene-expression profiles. By adopting a probabilistic modeling framework compatible with the family of models represented by dynamic Bayesian networks...

A Mixed Integer Programming Approach to Optimal Feeder Routing for Tree-Based Distribution System: A Case Study

A genetic algorithm is proposed to optimize a tree-structured power distribution network considering optimal cable sizing. For minimizing the total cost of the network, a mixed-integer programming model is presented determining the optimal sizes of cables with minimized location-allocation cost. For designing the distribution lines in a power network, the primary factors must be considered as m...

Collaboration space division in collaborative product development based on a genetic algorithm

The advance in the global environment, rapidly changing markets, and information technology has created a new stage for design. In such an environment, one strategy for success is the Collaborative Product Development (CPD). Organizing people effectively is the goal of Collaborative Product Development, and it solves the problem with certain foreseeability. The development group activities are ...

Modeling of measurement error in refractive index determination of fuel cell using neural network and genetic algorithm

Abstract: In this paper, a method for determination of refractive index in membrane of fuel cell on basis of three-longitudinal-mode laser heterodyne interferometer is presented. The optical path difference between the target and reference paths is fixed and phase shift is then calculated in terms of refractive index shift. The measurement accuracy of this system is limited by nonlinearity erro...

Evolving Stochastic Context-Free Grammars from Examples Using a Minimum Description Length Principle

This paper describes an evolutionary approach to the problem of inferring stochastic context-free grammars from finite language samples. The approach employs a genetic algorithm, with a fitness function derived from a minimum description length principle. Solutions to the inference problem are evolved by optimizing the parameters of a covering grammar for a given language sample. We provide details...

Journal:
  • Swarm and Evolutionary Computation

Volume 31

Published 2016